Pix2Struct Widget Captioning Large
Apache-2.0
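Pix2Struct is an image-encoder, text-decoder model for visual language understanding, supporting tasks such as image captioning and visual question answering. This checkpoint is the large variant fine-tuned for widget captioning: generating a short text description of a UI element in an app screenshot.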
Image-to-Text
Transformers
Supports Multiple Languages
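Below is a minimal sketch of captioning a widget with the Hugging Face Transformers Pix2Struct classes. It makes several assumptions. The repository ID `google/pix2struct-widget-captioning-large` is assumed. The target widget is assumed to be marked by drawing its bounding box on the screenshot, which is how we understand the fine-tuning setup; check the official model card for the exact expected input. The helper names `highlight_widget` and `caption_widget` and the file `screenshot.png` are hypothetical.

```python
from PIL import Image, ImageDraw

# Assumed Hugging Face repository ID for this checkpoint.
MODEL_ID = "google/pix2struct-widget-captioning-large"


def highlight_widget(screenshot, box, color="red", width=3):
    """Return an RGB copy of the screenshot with the widget's box outlined.

    box is (left, top, right, bottom) in pixels. Assumption: the model expects
    the target widget to be marked this way; verify against the model card.
    """
    marked = screenshot.convert("RGB")  # convert() returns a new image
    ImageDraw.Draw(marked).rectangle(box, outline=color, width=width)
    return marked


def caption_widget(screenshot, box, max_new_tokens=32):
    """Generate a caption for the widget inside `box` (hypothetical helper)."""
    # Imported lazily so highlight_widget works without transformers installed.
    from transformers import Pix2StructForConditionalGeneration, Pix2StructProcessor

    processor = Pix2StructProcessor.from_pretrained(MODEL_ID)
    model = Pix2StructForConditionalGeneration.from_pretrained(MODEL_ID)

    # Captioning checkpoints take only an image; no text prompt is passed.
    inputs = processor(images=highlight_widget(screenshot, box), return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return processor.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    img = Image.open("screenshot.png")  # hypothetical local screenshot
    print(caption_widget(img, (40, 120, 200, 160)))
```

Loading the processor and model inside `caption_widget` keeps the sketch short. For repeated calls you would load them once and reuse them.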